Researchers Launch LPM1.0 Model: Achieving Real-Time Interactive Digital Human Video from a Single Image
The LPM1.0 model generates real-time video of a person speaking, listening, or singing from a single reference image. Its core breakthrough is multimodal processing: it fuses text, audio, and image inputs to produce dynamic scenes with accurate lip synchronization, subtle facial expressions, and natural emotional transitions. The model can also be integrated with mainstream speech AI systems such as ChatGPT, upgrading a traditional voice conversation into a real-time interactive experience with visual feedback.
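To make the described integration concrete, the sketch below shows one plausible shape of such a loop: a chat model produces a text reply, a speech stage converts it to time-aligned phonemes, and an LPM-style generator animates a single reference image in sync with that audio. Every class, method, and parameter name here is hypothetical, since the article does not document the actual LPM1.0 API; the chat reply and phoneme alignment are stubbed with placeholders.

```python
# Hypothetical sketch of the interactive pipeline described in the article.
# None of these names come from LPM1.0 itself; they only illustrate the
# text -> audio -> synchronized-video flow.
from dataclasses import dataclass


@dataclass
class VideoFrame:
    timestamp_ms: int
    mouth_shape: str  # phoneme driving lip sync at this instant


class TalkingHeadModel:
    """Stand-in for a single-image digital-human generator (hypothetical)."""

    def __init__(self, reference_image: bytes):
        self.reference_image = reference_image

    def animate(self, phonemes: list[str], frame_rate: int = 25) -> list[VideoFrame]:
        # Emit one frame per phoneme slot; a real model would render RGB
        # frames of the reference face with matching lip and expression motion.
        step = 1000 // frame_rate
        return [VideoFrame(i * step, p) for i, p in enumerate(phonemes)]


def interactive_turn(model: TalkingHeadModel, user_text: str) -> list[VideoFrame]:
    reply = f"Echo: {user_text}"             # placeholder for a ChatGPT-style reply
    phonemes = list(reply.replace(" ", ""))  # placeholder for TTS + phoneme alignment
    return model.animate(phonemes)


if __name__ == "__main__":
    frames = interactive_turn(TalkingHeadModel(b"face.png"), "Hello")
    print(f"{len(frames)} frames, first at {frames[0].timestamp_ms} ms")
```

The key design point the article implies is that generation must be streaming and audio-driven: each output frame is tied to a timestamp in the synthesized speech, which is what makes lip synchronization and real-time visual feedback possible.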